151 research outputs found
Entity-Oriented Search
This open access book covers all facets of entity-oriented search (where "search" can be interpreted in the broadest sense of information access) from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in-depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents), a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book.
A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms.
Conversational AI from an Information Retrieval Perspective: Remaining Challenges and a Case for User Simulation
Conversational AI is an emerging field of computer science that engages multiple research communities, from information retrieval to natural language processing to dialogue systems. Within this vast space, we focus on conversational information access, a problem that is uniquely suited to be addressed by the information retrieval community. We argue that despite the significant research activity in this area, progress is mostly limited to component-level improvements. There remains a disconnect between current efforts and truly conversational information access systems. Apart from the inherently challenging nature of the problem, the lack of progress, in large part, can be attributed to the shortage of appropriate evaluation methodology and resources. This paper highlights challenges that render both offline and online evaluation methodologies unsuitable for this problem, and discusses the use of user simulation as a viable solution.
Towards Building a Knowledge Base of Monetary Transactions from a News Collection
We address the problem of extracting structured representations of economic
events from a large corpus of news articles, using a combination of natural
language processing and machine learning techniques. The developed techniques
allow for semi-automatic population of a financial knowledge base, which, in
turn, may be used to support a range of data mining and exploration tasks. The
key challenge we face in this domain is that the same event is often reported
multiple times, with varying correctness of details. We address this challenge
by first collecting all information pertinent to a given event from the entire
corpus, then considering all possible representations of the event, and
finally using a supervised learning method to rank these representations by
the associated confidence scores. A main innovative element of our approach is
that it jointly extracts and stores all attributes of the event as a single
representation (quintuple). Using a purpose-built test set we demonstrate that
our supervised learning approach can achieve a 25% improvement in F1-score over
baseline methods that consider the earliest, the latest or the most frequent
reporting of the event.
Comment: Proceedings of the 17th ACM/IEEE-CS Joint Conference on Digital
Libraries (JCDL '17), 201
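The three baselines named in this abstract (earliest, latest, and most frequent reporting of an event) can be illustrated with a minimal Python sketch. The `Report` type, its field layout, and the toy values are hypothetical and made up for illustration; the abstract does not specify what the quintuple contains, and this is not the authors' implementation.

```python
from collections import Counter
from typing import NamedTuple


class Report(NamedTuple):
    """One reported version of an event (hypothetical layout)."""
    published: str     # ISO date of the news article
    attributes: tuple  # candidate event representation, e.g. a quintuple


def earliest(reports):
    # Baseline 1: trust the first report of the event.
    return min(reports, key=lambda r: r.published).attributes


def latest(reports):
    # Baseline 2: trust the most recent report of the event.
    return max(reports, key=lambda r: r.published).attributes


def most_frequent(reports):
    # Baseline 3: trust the representation reported most often.
    counts = Counter(r.attributes for r in reports)
    return counts.most_common(1)[0][0]


# Toy data (invented): the same event reported four times with varying details.
reports = [
    Report("2016-01-05", ("A", "B", "acquisition", "1.0bn", "USD")),
    Report("2016-01-06", ("A", "B", "acquisition", "1.2bn", "USD")),
    Report("2016-01-09", ("A", "B", "acquisition", "1.2bn", "USD")),
    Report("2016-01-12", ("A", "B", "acquisition", "1.3bn", "USD")),
]

for baseline in (earliest, latest, most_frequent):
    print(baseline.__name__, baseline(reports))
```

In this toy case the three baselines each pick a different amount, which is the kind of disagreement a supervised ranker over candidate representations would have to resolve.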
Ad Hoc Table Retrieval using Semantic Similarity
We introduce and address the problem of ad hoc table retrieval: answering a
keyword query with a ranked list of tables. This task is not only interesting
on its own account, but is also being used as a core component in many other
table-based information access scenarios, such as table completion or table
mining. The main novel contribution of this work is a method for performing
semantic matching between queries and tables. Specifically, we (i) represent
queries and tables in multiple semantic spaces (both discrete sparse and
continuous dense vector representations) and (ii) introduce various similarity
measures for matching those semantic representations. We consider all possible
combinations of semantic representations and similarity measures and use these
as features in a supervised learning model. Using a purpose-built test
collection based on Wikipedia tables, we demonstrate significant and
substantial improvements over a state-of-the-art baseline.
Comment: The Web Conference 2018 (WWW'18)
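As a rough illustration of turning semantic representations into ranking features, the Python sketch below treats a query and a table as bags of term vectors and computes a few similarity scores between them. The specific measures (centroid cosine, maximum and average pairwise cosine), the function names, and the toy vectors are assumptions chosen for illustration; the abstract does not name the measures the authors use.

```python
import math


def cosine(u, v):
    # Cosine similarity between two equal-length vectors; 0.0 if either is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    # Component-wise mean of a non-empty list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def similarity_features(query_vecs, table_vecs):
    # One feature per (representation, measure) combination; a single
    # representation and three measures are shown here.
    pairs = [cosine(q, t) for q in query_vecs for t in table_vecs]
    return {
        "centroid": cosine(centroid(query_vecs), centroid(table_vecs)),
        "pair_max": max(pairs),
        "pair_avg": sum(pairs) / len(pairs),
    }


# Toy 2-dimensional term vectors (invented for the example).
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
table_vecs = [[1.0, 0.0]]
feats = similarity_features(query_vecs, table_vecs)
print(feats)
```

Each feature dictionary would be computed per query-table pair and per semantic space, then passed to a supervised ranking model of one's choice.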
Semantic Answer Type Prediction using BERT: IAI at the ISWC SMART Task 2020
This paper summarizes our participation in the SMART Task of the ISWC 2020 Challenge. A particular question we are interested in answering is how well neural methods, and specifically transformer models, such as BERT, perform on the answer type prediction task compared to traditional approaches. Our main finding is that coarse-grained answer types can be identified effectively with standard text classification methods, with over 95% accuracy, and BERT can bring only marginal improvements. For fine-grained type detection, on the other hand, BERT clearly outperforms previous retrieval-based approaches.
Towards Filling the Gap in Conversational Search: From Passage Retrieval to Conversational Response Generation
Research on conversational search has so far mostly focused on query
rewriting and multi-stage passage retrieval. However, synthesizing the top
retrieved passages into a complete, relevant, and concise response is still an
open challenge. Having snippet-level annotations of relevant passages would
enable both (1) the training of response generation models that are able to
ground answers in actual statements and (2) the automatic evaluation of the
generated responses in terms of completeness. In this paper, we address the
problem of collecting high-quality snippet-level answer annotations for two of
the TREC Conversational Assistance track datasets. To ensure quality, we first
perform a preliminary annotation study, employing different task designs,
crowdsourcing platforms, and workers with different qualifications. Based on
the outcomes of this study, we refine our annotation protocol before proceeding
with the full-scale data collection. Overall, we gather annotations for 1.8k
question-paragraph pairs, each annotated by three independent crowd workers.
The process of collecting data at this magnitude also led to multiple insights
about the problem that can inform the design of future response-generation
methods. This is an extended version of the article published with the same
title in the Proceedings of CIKM'23.
Comment: Extended version of the paper that appeared in the Proceedings of the
32nd ACM International Conference on Information and Knowledge Management
(CIKM '23)
Target Type Identification for Entity-Bearing Queries
Identifying the target types of entity-bearing queries can help improve
retrieval performance as well as the overall search experience. In this work,
we address the problem of automatically detecting the target types of a query
with respect to a type taxonomy. We propose a supervised learning approach with
a rich variety of features. Using a purpose-built test collection, we show that
our approach outperforms existing methods by a remarkable margin. This is an
extended version of the article published with the same title in the
Proceedings of SIGIR'17.
Comment: Extended version of SIGIR'17 short paper, 5 pages